Data Exploration

Read the data in using readxl::read_xlsx. What other ways can we read this data in?
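Which function to use depends on the file type. For an Excel file, options include -
- readxl::read_excel (handles both .xls and .xlsx)
- gdata::read.xls
- openxlsx::read.xlsx
- read.table or read.csv, but only after the sheet has been exported to a delimited text file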
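
As a rough illustration of the text-file route, the sketch below writes a small made-up table to a temporary CSV file and reads it back with read.csv, which is read.table with comma-separated defaults. The toy columns and values are assumptions for illustration only and are not taken from the real spreadsheet.

```r
# Toy stand-in for a sheet exported from Excel as CSV (values are invented).
toy <- data.frame(
  caseID = c("A1", "A2", "A3"),
  tissue = c("lung", "brain", "liver"),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the example does not touch the working directory.
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# read.csv() is read.table() with defaults suited to comma-separated files.
toy_back <- read.csv(csv_path, stringsAsFactors = FALSE)
print(toy_back)
```

For the .xlsx file itself no export step is needed, since readxl::read_excel reads the workbook directly.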


What type of data is in the magda_cleanedmiRNAdata.xlsx file? How is it structured?
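The data show microRNA (miRNA) levels measured in different tissues, collected at different gestational ages from several different foetuses. It is laid out as a matrix: the first rows hold the sample characteristics and each remaining row holds one miRNA.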


How many “variables” are present?
There are 27 variables present, namely -
Reading data in

Variables present:

## # A tibble: 27 x 1
##    ...1                   
##    <chr>                  
##  1 lamID                  
##  2 gscID                  
##  3 caseID                 
##  4 tissue                 
##  5 condition              
##  6 conditionEx            
##  7 GA                     
##  8 trimester              
##  9 sex                    
## 10 processingTime         
## 11 libraryID              
## 12 flowCell               
## 13 Lane                   
## 14 indexAdapter           
## 15 MLPA                   
## 16 misoprostol            
## 17 modeLabour             
## 18 medicalTA              
## 19 totalReadsAligned_miRNA
## 20 propFullyIn_miRNA      
## 21 propPartiallyIn_miRNA  
## 22 totalReadsAligned_piRNA
## 23 propFullyIn_piRNA      
## 24 propPartiallyIn_piRNA  
## 25 RQS                    
## 26 Degraded               
## 27 450k.array

How many patients are present?
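There are a total of 106 patients present, of which:

- 45 are female
- 61 are male

The male count comes from a word search: MALE gives 106 matches because it also matches inside FEMALE, so
106 (word: MALE) - 45 (word: FEMALE)
= 61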
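
The subtraction works because a plain-text search for MALE also finds the MALE inside FEMALE. A minimal sketch with a made-up sex vector (not the real data) shows the same effect and how an exact comparison avoids it:

```r
# Hypothetical sex column; the real one has 106 entries.
sex <- c("MALE", "FEMALE", "MALE", "FEMALE", "MALE")

# Substring search: "MALE" is also found inside "FEMALE", so every entry matches.
hits_male   <- sum(grepl("MALE", sex, fixed = TRUE))
hits_female <- sum(grepl("FEMALE", sex, fixed = TRUE))
n_male_by_subtraction <- hits_male - hits_female

# Exact comparison counts males directly, with no subtraction needed.
n_male_exact <- sum(sex == "MALE")

# table() gives both counts at once.
sex_counts <- table(sex)
print(sex_counts)
```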


How many samples are present?
- There are 106 total patients.
- Number of miRNAs = 2822 (total rows) - 29 (27 variables plus initial empty rows) = 2793

Hence, counting each miRNA measured in each patient as one sample, the total is
2793 * 106
= 296,058
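
The same arithmetic can be checked in R. The three input numbers are the ones quoted above; the variable names are only illustrative.

```r
total_rows <- 2822  # total rows in the spreadsheet
non_mirna  <- 29    # 27 variables plus the initial empty rows
n_patients <- 106   # one column per patient

# Rows that remain once the non-miRNA rows are removed.
n_mirna <- total_rows - non_mirna

# One measurement per miRNA per patient.
n_measurements <- n_mirna * n_patients

print(c(miRNAs = n_mirna, measurements = n_measurements))
```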


What is the breakdown of tissues / trimester? Generate a pretty table using kableExtra::kable(). Do this with at least 1 other variable of your choosing.
I decided to subset 5 variables -
- tissue
- extended sample condition information (conditionEx)
- gestational age (GA)
- trimester
- sex
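
One possible way to build the tissue / trimester breakdown is to cross-tabulate the two columns with table() and pass the result to kable(). The sketch below uses a small invented data frame named meta (a hypothetical name), and it calls knitr::kable only if knitr is installed, falling back to a plain print otherwise.

```r
# Hypothetical subset of the characteristics; values are invented.
meta <- data.frame(
  tissue    = c("lung", "lung", "brain", "chorionic villi", "chorionic villi"),
  trimester = c(2, 2, 2, 1, 3),
  sex       = c("MALE", "FEMALE", "MALE", "FEMALE", "MALE"),
  stringsAsFactors = FALSE
)

# Cross-tabulate tissue against trimester; as.data.frame.matrix keeps the
# wide layout (tissues as rows, trimesters as columns).
tissue_by_trimester <- as.data.frame.matrix(table(meta$tissue, meta$trimester))

# The same idea with a second variable of choice, here sex.
tissue_by_sex <- as.data.frame.matrix(table(meta$tissue, meta$sex))

if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(tissue_by_trimester,
                     caption = "Samples per tissue and trimester"))
} else {
  print(tissue_by_trimester)
}
```

The kable output can then be styled further, for example with kableExtra::kable_styling().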


Summary Characteristics

Summary of the variables:

##              tissue                 conditionEx                  GA    
##  brain          :10   spina bifida        :27   22                :12  
##  chorionic villi:32   pPROM               :22   17                : 7  
##  kidney         :10   control             :18   18                : 7  
##  liver          :10   anencephaly         :10   21.6              : 7  
##  lung           :24   GU abnormalities    : 7   22.4              : 7  
##  muscle         :10   Lipomyelomeningocele: 5   20.399999999999999: 6  
##  spinal cord    :10   (Other)             :17   (Other)           :60  
##  trimester     sex    
##  1: 6      FEMALE:45  
##  2:90      MALE  :61  
##  3:10                 
##                       
##                       
##                       
## 

Since the data were all in matrix form before, I converted them into a dataframe, raw_miRNA.df.

raw_miRNA.df:
1. is the transposed version of raw_miRNA
2. is a dataframe
3. has column names
4. is easier to view and understand
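
A minimal sketch of this kind of conversion, using a tiny invented matrix rather than raw_miRNA itself. It assumes the first column of the matrix holds the variable names and every other column is one sample; the object names raw_toy and toy_df are hypothetical.

```r
# Toy matrix in the spreadsheet's orientation: rows are variables,
# the first column is the variable name, the rest are samples.
raw_toy <- matrix(
  c("tissue", "lung", "brain",
    "sex",    "MALE", "FEMALE",
    "miR-a",  "10",   "25"),
  nrow = 3, byrow = TRUE
)

# Transpose the value columns so each sample becomes a row, then name the
# columns after the first column of the original matrix.
toy_df <- as.data.frame(t(raw_toy[, -1]), stringsAsFactors = FALSE)
colnames(toy_df) <- raw_toy[, 1]

# Everything is text in a character matrix, so numeric columns need converting.
toy_df[["miR-a"]] <- as.numeric(toy_df[["miR-a"]])

str(toy_df)
```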


The non-miRNA data (sample characteristics) are moved into a separate df:

Only the miRNA expression level data are kept in o_miRNA.df
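
The split could look roughly like the sketch below. It uses an invented two-sample data frame, and the names toy_df, meta_cols, meta_toy and mirna_toy are hypothetical; in the real data the characteristic columns would presumably be the 27 variables listed earlier.

```r
# Toy data frame with two characteristic columns and two miRNA columns.
toy_df <- data.frame(
  tissue = c("lung", "brain"),
  sex    = c("MALE", "FEMALE"),
  miR_a  = c(10, 25),
  miR_b  = c(3, 0),
  stringsAsFactors = FALSE
)

# Columns that describe the sample rather than miRNA expression.
meta_cols <- c("tissue", "sex")

# Characteristics go into one data frame ...
meta_toy <- toy_df[, meta_cols, drop = FALSE]

# ... and everything else (the expression values) into another.
mirna_toy <- toy_df[, setdiff(colnames(toy_df), meta_cols), drop = FALSE]

print(meta_toy)
print(mirna_toy)
```

Selecting by column name rather than by position keeps the split correct even if the column order changes.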